Tutorial: Fine-tuning VGG on CIFAR-10

One of the most common questions we get is how to use neon to load a pre-trained model and fine-tune it on a new dataset. In this tutorial, we show how to load a pre-trained convolutional neural network (VGG), which was trained on ImageNet, a large corpus of natural images with 1000 categories. We then fine-tune this model on the CIFAR-10 dataset, a much smaller set of images with 10 categories.

We begin by generating a computational backend with the gen_backend function from neon. If a GPU is available (recommended), this function will generate a GPU backend; otherwise, a CPU backend will be used.

Note: VGG will not fit on a Kepler GPU, so here we use the CPU backend for instructional purposes. If you are running on a Maxwell or newer GPU, switch the backend to gpu below.


In [ ]:
from neon.backends import gen_backend

# use backend='gpu' here if you are on a Maxwell or newer GPU
be = gen_backend(batch_size=64, backend='cpu')

We can inspect the backend via:


In [ ]:
print(be)

Defining the VGG Model

First, we define our VGG network. VGG is a popular family of convolutional neural networks with 16 to 19 weight layers; here we build the 16-layer "D" configuration to match the pre-trained weights we load later. Not only does this network perform well with fine-tuning, it is also easy to define: the convolutional layers all use 3x3 filters and differ only in their number of feature maps.

We first define some common parameters used by all the convolution layers:


In [ ]:
from neon.transforms import Rectlin
from neon.initializers import Constant, Xavier

relu = Rectlin()
conv_params = {'strides': 1,
               'padding': 1,
               'init': Xavier(local=True),
               'bias': Constant(0),
               'activation': relu}

Then, we can define our network as a list of layers:


In [ ]:
from neon.layers import Conv, Dropout, Pooling, GeneralizedCost, Affine
from neon.initializers import GlorotUniform


# Set up the model layers
vgg_layers = []

# set up 3x3 conv stacks with different number of filters
vgg_layers.append(Conv((3, 3, 64), **conv_params))
vgg_layers.append(Conv((3, 3, 64), **conv_params))
vgg_layers.append(Pooling(2, strides=2))
vgg_layers.append(Conv((3, 3, 128), **conv_params))
vgg_layers.append(Conv((3, 3, 128), **conv_params))
vgg_layers.append(Pooling(2, strides=2))
vgg_layers.append(Conv((3, 3, 256), **conv_params))
vgg_layers.append(Conv((3, 3, 256), **conv_params))
vgg_layers.append(Conv((3, 3, 256), **conv_params))
vgg_layers.append(Pooling(2, strides=2))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Pooling(2, strides=2))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Conv((3, 3, 512), **conv_params))
vgg_layers.append(Pooling(2, strides=2))
vgg_layers.append(Affine(nout=4096, init=GlorotUniform(), bias=Constant(0), activation=relu))
vgg_layers.append(Dropout(keep=0.5))
vgg_layers.append(Affine(nout=4096, init=GlorotUniform(), bias=Constant(0), activation=relu))
vgg_layers.append(Dropout(keep=0.5))

The last layer of VGG is an Affine layer with 1000 units, one for each of the 1000 categories in the ImageNet dataset. Since our dataset has only 10 classes, we use 10 output units instead. We also give this layer a special name (class_layer) so we know not to load pre-trained weights for it.


In [ ]:
from neon.transforms import Softmax

vgg_layers.append(Affine(nout=10, init=GlorotUniform(), bias=Constant(0), activation=Softmax(),
                  name="class_layer"))

Now we are ready to load the pre-trained weights into this model. First we generate a Model object to hold the VGG layers:


In [ ]:
from neon.models import Model

model = Model(layers=vgg_layers)

Loading pre-trained weights

Next, we download the pre-trained VGG weights from our Model Zoo. Note: this file is quite large (~550MB). By default, the weights file is saved in your home directory. To change this, or if you have already downloaded the file somewhere else, please edit the filepath variable below.


In [ ]:
from neon.data.datasets import Dataset
from neon.util.persist import load_obj
import os

# location and size of the VGG weights file
url = 'https://s3-us-west-1.amazonaws.com/nervana-modelzoo/VGG/'
filename = 'VGG_D.p'
size = 554227541

# edit filepath below if you have the file elsewhere
_, filepath = Dataset._valid_path_append('data', '', filename)
if not os.path.exists(filepath):
    Dataset.fetch_dataset(url, filename, filepath, size)

# load the weights param file
print("Loading VGG weights from {}...".format(filepath))
trained_vgg = load_obj(filepath)
print("Done!")

In neon, models are saved as Python dictionaries. Below are some example calls to explore the model: you can examine the weights, the layer configuration, and more.


In [ ]:
print("The dictionary has the following keys: {}".format(trained_vgg.keys()))

layer0 = trained_vgg['model']['config']['layers'][0]
print("The first layer is of type: {}".format(layer0['type']))

# filter weights of the first layer
W = layer0['params']['W']
print("The first layer weights have average magnitude of {:.2}".format(abs(W).mean()))

We encourage you to use the blank code cell below to explore the model dictionary!


In [ ]:

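For example, here is one way to list the name and type of every layer stored in the pre-trained model, using only the dictionary keys we inspected above:


In [ ]:
# print the name and type of each layer in the saved model dictionary
for i, layer in enumerate(trained_vgg['model']['config']['layers']):
    print("Layer {}: {} ({})".format(i, layer['config']['name'], layer['type']))
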
We then iterate over the layers in our model and load the weights from trained_vgg using each layer's load_weights method. The final Affine layer differs between our model and the pre-trained model, since the number of classes has changed. We therefore break out of the loop when we encounter the final Affine layer, which has the name class_layer.


In [ ]:
param_layers = [l for l in model.layers.layers]
param_dict_list = trained_vgg['model']['config']['layers']
for layer, params in zip(param_layers, param_dict_list):
    if layer.name == 'class_layer':
        break

    # To be sure, we print the name of the layer in our model 
    # and the name in the vgg model.
    print(layer.name + ", " + params['config']['name'])
    layer.load_weights(params, load_states=True)

As a check, the above code should have printed pairs of layer names, one from our model and one from the pre-trained VGG model. The exact names may differ, but the layer type and position should match between the two.
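
As an additional, optional check, you can confirm that the model now holds the pre-trained values. This sketch assumes the first entry of param_layers is the convolution layer whose W tensor was just loaded; the printed magnitude should match the value we computed from the dictionary earlier.


In [ ]:
# the mean magnitude should match the dictionary value printed above
W_model = param_layers[0].W.get()
print("The first layer weights now have average magnitude of {:.2}".format(abs(W_model).mean()))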

Fine-tuning VGG on the CIFAR-10 dataset

Now that we've modified the model for our new CIFAR-10 dataset, and loaded the model weights, let's give training a try!

Aeon Dataloader

The CIFAR-10 dataset is small enough to fit into memory, meaning that we would normally use an ArrayIterator to generate our dataset. However, CIFAR-10 images are 32x32, while VGG was trained on ImageNet images at 224x224. For this reason, we use our macrobatching dataloader, aeon, which performs image scaling and cropping on-the-fly.

To prepare the data, we first invoke an ingestion script that downloads the data and creates the macrobatches. The script is located in your neon folder; here we use a bit of Python to extract the path to neon from the virtual environment.


In [ ]:
import neon
import os
neon_path = os.path.split(os.path.dirname(neon.__file__))[0]
print "Found path to neon as {}".format(neon_path)

Then we execute the ingestion script below.


In [ ]:
%run $neon_path/examples/cifar10_msra/data.py --out_dir data/cifar10/

Aeon configuration

Aeon allows a diverse set of configurations that specify which transformations are applied on-the-fly during training. These configurations are specified as Python dictionaries. For more detail, see the aeon documentation.


In [ ]:
config = {
    'manifest_filename': 'data/cifar10/train-index.csv',  # CSV manifest of data
    'manifest_root': 'data/cifar10',  # root data directory
    'image': {'height': 224, 'width': 224,  # output image size 
              'scale': [0.875, 0.875],  # scale range for cropping (min == max, so fixed here)
              'flip_enable': True},  # randomly flip image
    'type': 'image,label',  # type of data
    'minibatch_size': be.bsz  # batch size
}

We then use this configuration to create our dataloader, and pass its outputs through a series of transformations.


In [ ]:
import numpy as np

from neon.data.aeon_shim import AeonDataLoader
from neon.data.dataloader_transformers import OneHot, TypeCast, BGRMeanSubtract

train_set = AeonDataLoader(config, be)

train_set = OneHot(train_set, index=1, nclasses=10)  # perform onehot on the labels
train_set = TypeCast(train_set, index=0, dtype=np.float32)  # cast the image to float32
train_set = BGRMeanSubtract(train_set, index=0)  # subtract image color means (based on default values)
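
As a quick sanity check, we can peek at a single transformed minibatch. This sketch just grabs the first batch and prints the tensor shapes; under neon's (features, batch) layout we expect the images to be 3x224x224 = 150528 rows by 64 columns, and the onehot labels 10 by 64.


In [ ]:
# peek at one transformed minibatch: x holds the images, t the onehot labels
for x, t in train_set:
    print("image batch shape: {}".format(x.shape))
    print("label batch shape: {}".format(t.shape))
    break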

Optimizer configuration

For fine-tuning, we want the final Affine layer to be updated with a higher learning rate compared to the pre-trained weights throughout the rest of the network.


In [ ]:
from neon.optimizers import GradientDescentMomentum, Schedule, MultiOptimizer
from neon.transforms import CrossEntropyMulti
# define different optimizers for the class_layer and the rest of the network
# we use a momentum coefficient of 0.9 and weight decay of 0.0005.
opt_vgg = GradientDescentMomentum(0.001, 0.9, wdecay=0.0005)
opt_class_layer = GradientDescentMomentum(0.01, 0.9, wdecay=0.0005)

# also define optimizers for the bias layers, which have a different learning rate
# and no weight decay.
opt_bias = GradientDescentMomentum(0.002, 0.9)
opt_bias_class = GradientDescentMomentum(0.02, 0.9)

# set up the mapping of layers to optimizers
opt = MultiOptimizer({'default': opt_vgg,
                      'Bias': opt_bias,
                      'class_layer': opt_class_layer,
                      'class_layer_bias': opt_bias_class})

# use cross-entropy cost to train the network
cost = GeneralizedCost(costfunc=CrossEntropyMulti())

Finally, we set up callbacks so the model can report progress during training, and then run the model.fit function. Note that if you are running on a CPU, this next step will take a long time to complete.


In [ ]:
from neon.callbacks.callbacks import Callbacks

callbacks = Callbacks(model)
model.fit(train_set, optimizer=opt, num_epochs=10, cost=cost, callbacks=callbacks)

You should see the network cost decrease over the course of training as it fine-tunes on this new dataset.
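
To put a number on that, you can evaluate the fine-tuned model on held-out data. The sketch below assumes the ingestion script also wrote a val-index.csv manifest to data/cifar10 (check your output directory); it reuses the training configuration with random flipping disabled, since we don't want augmentation at evaluation time.


In [ ]:
from neon.transforms import Misclassification

# build a validation dataloader with the same transforms as training,
# assuming the ingestion script produced a val-index.csv manifest
val_config = dict(config, manifest_filename='data/cifar10/val-index.csv')
val_config['image'] = dict(config['image'], flip_enable=False)  # no flipping at eval time
val_set = AeonDataLoader(val_config, be)
val_set = OneHot(val_set, index=1, nclasses=10)
val_set = TypeCast(val_set, index=0, dtype=np.float32)
val_set = BGRMeanSubtract(val_set, index=0)

# model.eval averages the metric over all minibatches
error = model.eval(val_set, metric=Misclassification())
print("Misclassification error: {:.1f}%".format(error[0] * 100))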

Customize for your own data

With neon it is easy to adapt this notebook to fine-tune on your own dataset -- most of the code above can be reused regardless of your specific problem. There are just a few things to do:

  1. Organize your image dataset into a format that our macrobatching loader recognizes. Run our batch_writer.py script to generate the macrobatches. See Macrobatching in our documentation for more information.
  2. Modify the number of output units in the last Affine layer of the network to match the number of categories in your dataset (see the sketch below).
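
For example, if your dataset had 25 categories (a number chosen purely for illustration), only the final layer definition would change:


In [ ]:
# hypothetical dataset with 25 categories: only nout changes in the final
# classification layer (keep the name so pre-trained weights are skipped)
vgg_layers.append(Affine(nout=25, init=GlorotUniform(), bias=Constant(0),
                         activation=Softmax(), name="class_layer"))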

Feel free to visit our GitHub page or our documentation if you have questions!